Prosper is a peer-to-peer lending marketplace that brings borrowers and investors to a common platform. Prosper uses the borrower’s credit profile, past loan information, income and a variety of other factors to determine a Prosper score, Prosper rating and interest rate. Investors can choose which profiles to lend money to and the amount of money to invest.
This dataset shared by Prosper contains the borrower, investor and loan details for 113,937 loans taken during the period 2005 to 2014.
## [1] 113937 81
This dataset has 81 variables.
To understand the dataset better, I want to first take a look at the borrower’s data. What does a borrower’s profile on Prosper look like?
Occupation
The Prosper website allows borrowers to choose from 67 types of occupations. It is an exhaustive list, yet a lot of users seem to choose “Other” as the occupation. Professionals and computer programmers seem to be the top occupations among Prosper borrowers.
Income
Disclosing one’s income is not mandatory on Prosper. But lenders do prefer to invest in loans where borrowers have shared their income details.
As expected most borrowers are in the income range of $25,000 to $75,000.
Credit Score
Though Prosper does not share details on the algorithms used to determine a borrower’s rate of interest, it is common knowledge that a borrower’s credit score definitely plays a role in the rate of interest.
We should expect this distribution to be almost normal. Bad credit scores mean high risk and great credit scores mean low risk. In a lending marketplace, risk has a direct relation to rate of interest. Lenders will try to maximise their returns while minimizing their risks.
I can see that there are some very low scores, around 400 and some 0s.
Debt To Income Ratio
A lot of investors might use Debt To Income Ratio to estimate the borrower’s capacity to complete payments on the loan.
Most of the borrowers have a debt to income ratio of less than 1, which is a good sign. Changing the scale to logarithmic will allow me to take a closer look at the distribution.
The distribution is positively skewed with a long tail of borrowers towards the right. This shows that most of the borrowers on the platform have a lower debt to income ratio, with less than 1 being the most common. It is not clear at this point if investors prefer borrowers with lower debt to income ratio, but it will become clear in the upcoming sections.
Restricting the y axis to 1 on the box plot, we can see that most of the borrowers have a debt to income ratio between 0.1 and 0.4. There are a lot of outliers with ratios over 0.55.
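As a rough illustration of the two steps above (a log scale, then box plot outliers), here is a minimal sketch in base R. The `dti` values are made up for the example and are not taken from the Prosper data.

```r
# Made-up debt to income ratios with a long right tail (illustration only).
dti <- c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50, 1.20, 10.01)

# A log10 transform compresses the tail so the bulk of the values is easier to see.
# Note: a ratio of exactly 0 would become -Inf and would need separate handling.
log_dti <- log10(dti)

# Quartiles on the original scale.
dti_quartiles <- quantile(dti, probs = c(0.25, 0.5, 0.75))

# Points beyond 1.5 * IQR from the hinges, i.e. what a box plot draws as outliers.
dti_outliers <- boxplot.stats(dti)$out

print(dti_quartiles)
print(dti_outliers)
```

On the real data the same calls would be applied to the DebtToIncomeRatio column after dropping missing values.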
Prosper Rating
Prosper Rating is calculated by Prosper based on the borrower’s credit profile and other information. This rating indicates a user’s credit worthiness on Prosper and is used by many investors to evaluate and filter investment options.
Prosper introduced the rating concept in 2009. Hence loan data before 2009 contains NA for the Prosper rating. Removing the NA values from the data, we can see a clearer distribution.
The distribution of Prosper Ratings for borrowers is almost normal, with a peak at C. This again sits well with the assumption that most lenders prefer moderate risks, which are offered by borrowers with a medium Prosper score.
Reason for borrowing
The borrowers are asked to specify the reason for borrowing money in the Listing Category field. This again provides investors a chance to evaluate the borrower.
Over 50% of the borrowers use Prosper loans to consolidate their debt. Credit card debts, for example, are more expensive and it is wise to use a loan from a peer-to-peer lending company at a competitive rate to pay them off.
Loan Statuses
This is the status of all loans in Prosper for the period 2005 to 2014. As expected most loans have completed or are in current status with payments happening on time.
The numbers do not paint a story, so let’s look at percentages.
Close to 86% of Prosper loans complete on time.
The remaining approximately 14% of loans have a risk of being charged off. Charged off loans remain either fully or partially uncollected. Unfortunately, Prosper data does not contain information about charged off loans recovery, but sources on the internet quote this number at about 16% *.
(* Source: https://www.orchardplatform.com/blog/understanding-loan-statuses/)
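Percentages like the ones quoted above can be obtained from a status column with `table()` and `prop.table()`. This is only a sketch: the statuses below are invented, and the column name `LoanStatus` mentioned in the comment is an assumption about the dataset.

```r
# Invented loan statuses; in the real data these would come from a status
# column (assumed here to be called LoanStatus).
loan_status <- c("Completed", "Current", "Current", "Chargedoff", "Completed",
                 "Defaulted", "Current", "Completed", "Current", "Completed")

# Counts per status, then the share of each status as a percentage.
status_counts <- table(loan_status)
status_pct <- round(100 * prop.table(status_counts), 1)

print(status_counts)
print(status_pct)
```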
First defaults
How long does it take before borrowers start defaulting on their loans? Prosper offers loans with terms of 1, 3 and 5 years, with most borrowers choosing the 3 year term.
Most of the first defaults happen between months 8 and 22. There are some outliers that default after 40 months of the loan period. But it is clear that borrowers who continue payments for most of their loan term tend to pay the loan in full.
Lender Yields
As we can see from the distribution and the box plot, the lender yields are mostly in the range of 12% to 24%. There are a few outliers in the 45% to 50% range but these are too high to be a repeating phenomenon.
Borrower Rate
We can see the same patterns with borrower rate (as with lender’s yield). This makes sense as these 2 variables are closely related.
Loan Amounts
The borrowing amounts are in the range of $4,000 to $12,000 with a much lower median amount at approximately $6,000.
Number of investors per loan
Most of the loans have anywhere up to 125 investors and the median number of investors is about 50. There are loans with over 250 investors but these are few and far between.
The prosper loan dataset has 113937 observations of 81 variables.
These 81 variables describe the loan, the borrower’s profile, payments and the lender’s yield and charges.
These are the variables I am interested in for this analysis.
The main features of interest for me in this dataset are to:
Identify what factors influence the Borrower’s rate. What is the impact of ProsperRating, CreditScoreRange and IncomeRange on Borrower’s rate?
What type of loans, credit profiles and prosper ratings do investors look for? How much do they invest in different borrower profiles?
Prosper changed its business model around 2009. From 2005 to 2008, Prosper operated in an eBay style auction model, where lenders and borrowers determined the lending rate by an auction. In 2009, it moved to a lending rate determined by Prosper based on the borrower’s credit risk and other details. Also, around November 2008, Prosper was pulled out of business by the SEC and Prosper came back to business after obtaining SEC registration in 2009. I am interested in investigating what impact these changes had on Prosper.
The Borrower’s rate is designed to compensate the investor for the risk they undertake. Higher risk = higher borrower’s rate. The risk here is mainly that of delinquency. So I want to look at what factors are an indication of an impending delinquency? Can current delinquencies and debt to income ratio help understand if a borrower will default on a loan?
ProsperRating is another factor that influences the Borrower’s rate. Prosper uses Credit Profile information including a borrower’s credit score to arrive at this rating. What is the relationship between Prosper Rating and Credit Score?
What type of loans do lenders want to invest in?
I renamed the variable ProsperRating (Alpha) to ProsperRatingAlpha for ease of use.
I created a new variable called Delinquent which is set to 1 for loans that are past due, charged off or defaulted and 0 for the others. This variable will be used to analyze Prosper’s loan performance over time.
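A minimal sketch of how such a flag could be derived in base R. The data frame is a toy example, and both the column name `LoanStatus` and the exact status labels are assumptions modelled on the description above.

```r
# Toy data; the column name and status labels are assumed for illustration.
loans <- data.frame(
  LoanStatus = c("Current", "Completed", "Chargedoff", "Defaulted",
                 "Past Due (1-15 days)", "Past Due (61-90 days)",
                 "FinalPaymentInProgress"),
  stringsAsFactors = FALSE
)

# Past due by any number of days, charged off or defaulted counts as delinquent.
is_bad <- grepl("^Past Due", loans$LoanStatus) |
  loans$LoanStatus %in% c("Chargedoff", "Defaulted")

# Store the flag as 1 (delinquent) or 0 (everything else).
loans$Delinquent <- as.integer(is_bad)

print(loans)
```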
I created a new variable called CreditLowerBucket which groups the CreditScoreRangeLower into 3 buckets: 720 and above - Low Risk, 600-719 - Medium Risk, less than 600 - High Risk. This will be used in various plots to see the credit spread of borrowers.
I created a new variable called InvestorCountBucket that groups the number of investors into buckets. 0-100, 101-200, 201-300, greater than 300. This will be used to understand the investment patterns of lenders.
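Both bucket variables can be built with `cut()`. The sketch below uses made-up scores and investor counts. The break points follow the ranges described above, while the label strings are my own choice.

```r
# Made-up values chosen to sit on and around the bucket boundaries.
credit_lower <- c(480, 599, 600, 660, 719, 720, 800)
investors <- c(1, 50, 100, 101, 200, 250, 301)

# right = FALSE makes the intervals [600, 720) and [720, Inf),
# so 600 is Medium Risk and 720 is Low Risk.
credit_bucket <- cut(credit_lower,
                     breaks = c(-Inf, 600, 720, Inf),
                     labels = c("High Risk", "Medium Risk", "Low Risk"),
                     right = FALSE)

# Default right = TRUE gives (0, 100], (100, 200], (200, 300], (300, Inf];
# include.lowest = TRUE also keeps a count of exactly 0 in the first bucket.
investor_bucket <- cut(investors,
                       breaks = c(0, 100, 200, 300, Inf),
                       labels = c("0-100", "101-200", "201-300", "300+"),
                       include.lowest = TRUE)

print(data.frame(credit_lower, credit_bucket, investors, investor_bucket))
```

Using `right = FALSE` for the credit buckets and the default for the investor buckets keeps each boundary value in the bucket named in the text.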
Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?
I log transformed the right skewed DebtToIncomeRatio distribution. As most of the DebtToIncomeRatio values are concentrated between 0 and 1, the shape of the distribution was not visible on a normal scale.
I converted the factor variable LoanOriginationDate into a Date variable so that date operations can be easily performed.
I ordered the factors ProsperRatingAlpha, IncomeRange and LoanOriginationQuarter in a logical manner.
Income Range - Lowest -> Highest (Ignoring Not Displayed): “Not employed”, “$0”, “$1-24,999”, “$25,000-49,999”, “$50,000-74,999”, “$75,000-99,999”, “$100,000+”, “Not displayed”
LoanOriginationQuarter - Ordered by Year: “Q1 2006”, “Q2 2006”, “Q3 2006”, “Q4 2006”, “Q1 2007”, …
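The date conversion and factor ordering described above might look roughly like this in base R. The three rows are invented, and the date string format is an assumption about how the raw column is stored.

```r
# Invented rows; the timestamp format of LoanOriginationDate is assumed.
loans <- data.frame(
  LoanOriginationDate = factor(c("2007-02-14 00:00:00",
                                 "2006-11-03 00:00:00",
                                 "2007-01-20 00:00:00")),
  LoanOriginationQuarter = c("Q1 2007", "Q4 2006", "Q1 2007"),
  IncomeRange = c("$50,000-74,999", "Not employed", "$1-24,999"),
  stringsAsFactors = FALSE
)

# Factor -> character -> Date; the time part after the date is ignored.
loans$LoanOriginationDate <- as.Date(as.character(loans$LoanOriginationDate),
                                     format = "%Y-%m-%d")

# Order quarters by year first, then by quarter number.
q <- unique(loans$LoanOriginationQuarter)
q_year <- as.integer(sub("^Q[1-4] ", "", q))
q_num <- as.integer(substr(q, 2, 2))
quarter_levels <- q[order(q_year, q_num)]
loans$LoanOriginationQuarter <- factor(loans$LoanOriginationQuarter,
                                       levels = quarter_levels, ordered = TRUE)

# Income ranges from lowest to highest, with "Not displayed" last.
income_levels <- c("Not employed", "$0", "$1-24,999", "$25,000-49,999",
                   "$50,000-74,999", "$75,000-99,999", "$100,000+",
                   "Not displayed")
loans$IncomeRange <- factor(loans$IncomeRange,
                            levels = income_levels, ordered = TRUE)

str(loans)
```

Without explicit levels, R sorts factor levels alphabetically, which would put “Q1 2007” before “Q4 2006” on a plot axis.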
I want to start off the Bivariate Plots with a ggpairs visual of the 10 variables that I am most interested in for this analysis.
A correlation plot shows the relationship between continuous variables in a more readable format.
From the plot, we can see some clear relationships.
The credit score has a negative correlation with the rate the borrower pays and the lender’s yield. The correlation is not very strong at -0.45.
The credit score also seems to have a negative correlation with the number of current delinquencies.
The box plot of income range and debt to income ratio seems to have some pattern.
The last variable that seems to play an important role is Prosper Rating. Prosper rating affects the borrower rate and the lender yield.
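Coefficients such as the -0.45 quoted above are Pearson correlations, which `cor()` computes. The sketch below uses invented numbers purely to show the call, so the resulting value has no relation to the Prosper data.

```r
# Invented credit scores and borrower rates (illustration only).
credit_score <- c(560, 600, 640, NA, 700, 720, 760, 800)
borrower_rate <- c(0.31, 0.29, 0.24, 0.20, 0.21, 0.15, 0.12, 0.08)

# use = "complete.obs" drops pairs where either value is missing;
# the default method is Pearson.
r <- cor(credit_score, borrower_rate, use = "complete.obs")

print(round(r, 2))
```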
I want to start with exploring relationships between some of these variables.
Lender yield and borrower credit score
As expected, the lender yield and the borrower rate decrease as the credit scores improve. Better credit scores offer lower risk and hence lower borrowing rate and lender yield.
Borrower rate and borrower credit score
Current delinquencies and borrower credit score
Borrowers with better credit scores have fewer delinquencies on their loans. Though this does not indicate that the borrower will not fail payments in the future, it is an indication of credit worthiness of the borrower.
As I expected, the number of delinquencies decreases with a better credit score.
Credit score is an important parameter to consider in further analysis.
Debt to income ratio of borrowers across income ranges
The very high debt to income ratios belong to borrowers who are unemployed. Among the borrowers with an income, the debt to income ratio seems well within 1 and decreases as the income increases.
The “Not employed” category has a huge debt to income ratio and that makes the rest of the data hard to read. Omitting the “Not employed” category and limiting the debt to income ratio to 1 will give a better chart.
Prosper rating and loan defaults
A better Prosper rating means the borrower might continue paying the loan for a slightly longer period. But this doesn’t reveal anything interesting.
Prosper rating and borrower rate
Prosper uses various factors to arrive at a borrower rate for the loan. My assumption is that the Prosper rating also uses most of the same factors. Hence there should be a relation between Prosper Rating and Borrower Rate.
by(prosper.loan$BorrowerRate, prosper.loan$ProsperRatingAlpha, summary)
## prosper.loan$ProsperRatingAlpha: AA
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.04000 0.06990 0.07790 0.07912 0.08450 0.21000
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: A
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0498 0.0990 0.1119 0.1129 0.1239 0.2150
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: B
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0693 0.1414 0.1509 0.1545 0.1639 0.3500
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: C
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0895 0.1765 0.1914 0.1944 0.2099 0.3500
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: D
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1157 0.2287 0.2492 0.2464 0.2625 0.3500
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: E
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1479 0.2712 0.2925 0.2933 0.3149 0.3600
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: HR
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1779 0.3134 0.3177 0.3173 0.3177 0.3600
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: NA
## NULL
Indeed, the median borrower rate increases as the Prosper Rating moves from AA to HR.
A Prosper borrower with a rating of AA gets a rate that is less than half the mean borrowing rate on Prosper.
Lender yield and prosper rating
Lender’s yield is calculated from Borrower rate, so it is expected that the Lender’s yield will follow the same pattern.
Credit scores and prosper rating
The credit scores are probably one of the factors that determine the Prosper rating. There is a decreasing trend in credit scores as we move from Prosper rating AA to HR.
Contribution Per Investor
Next I want to look at investing patterns of lenders. The lenders can invest anywhere between $25 and $35,000. Each loan can be funded by multiple investors.
The 3rd quartile for every rating is at or below $10,000. Rates start at 5.99% for a 3-year AA loan and go up to 31.72% for an HR loan. Loans in Prosper Ratings A, B and C seem to be the most popular among investors. AA rated loans are less popular because of the low rate of return. Loans rated D, E and HR are high risk and hence investors seem to be contributing smaller amounts to these loans.
by(prosper.loan$LoanOriginalAmount/prosper.loan$Investors, prosper.loan$ProsperRatingAlpha, summary)
## prosper.loan$ProsperRatingAlpha: AA
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 45.45 54.95 2090.28 97.32 30000.00
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: A
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.33 59.17 106.38 5092.49 10000.00 35000.00
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: B
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.48 70.65 416.67 5813.18 10000.00 35000.00
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: C
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.299 84.416 800.000 5422.738 10000.000 25000.000
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: D
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 65.22 134.62 1943.06 1428.57 15000.00
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: E
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.87 68.97 156.25 1406.34 2000.00 10000.00
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: HR
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.85 57.97 100.00 586.64 363.64 12000.00
## --------------------------------------------------------
## prosper.loan$ProsperRatingAlpha: NA
## NULL
Changes in prosper loans over time
When Prosper came back to business in 2009, there seems to have been a stricter limit on the credit scores allowed to borrow on Prosper.
In Q3 2008, Prosper was pulled out of business by the SEC. Around the same time, Prosper also changed its business model. Earlier interest rates on Prosper were determined by an auction mechanism. Prosper introduced Prosper Rating to rank its borrowers and pre-determined the interest rate based on this rating.
What about the borrower profiles on Prosper? Did they change? From this graph above it looks like Prosper enforced stricter credit mechanisms after reopening in 2009. Pre 2009, the credit scores of borrowers in Prosper never averaged to over 700. It went as low as 580 in 2006.
Between 2009 and 2014, the mean credit scores have been in the range of 680 and above. One reason for this could be the borrower rate. Prosper may be charging a higher rate for borrowers with a lower credit score.
Another reason could be practices around non-payment of loans. As per Prosper, delinquent customers take a hit to the credit score and are also not allowed to borrow from Prosper anymore. Though there is no way to validate this, it may have over time reduced the number of borrowers in the less than 600 credit score range.
What about Debt to Income Ratio over time?
As the credit scores of borrowers improved, I was curious to see if the debt to income ratios also improved. Better credit scores do not necessarily mean lower debt to income ratios, so I wanted to see how it played out for Prosper’s borrowers.
Since the ratio seems concentrated at the lower end, I want to zoom in and look at DebtToIncomeRatio up to 1.
There hasn’t been much variation in Debt to Income Ratio except for a brief spike during Prosper’s initial years.
Loan amount in Prosper
The average amount of money lent to customers has been on the rise since 2009.
There is a drop in 2009 when Prosper reopened for business. It has been on the rise ever since and is probably just a business trend as more people hear about Prosper and invest in Prosper loans.
How have these loans performed?
So more money is being invested in Prosper. But what about performance?
To see the trend in loan performance, I created a new variable called “delinquent” that takes 0 and 1 as values. Loans with status Past due by any number of days, charged off or defaulted will be marked 1. The rest of the loans are marked 0.
The percentage of delinquent loans on Prosper has decreased since its opening years. But the drop between 2012 and 2014 may just be because of more current loans during the period.
I wanted to look at numbers instead of percentages to offset the impact of current loans in 2013 and 2014.
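Both views, percentage and raw count of delinquent loans per period, can be computed from the 0/1 flag with `tapply()`. The rows below are made up, and the `Year` column is a hypothetical stand-in for whatever period variable is used on the plot.

```r
# Made-up loans; Year is a hypothetical grouping column for illustration.
loans <- data.frame(
  Year = c(2006, 2006, 2006, 2006, 2010, 2010, 2010, 2010, 2010),
  Delinquent = c(1, 1, 0, 0, 1, 0, 0, 0, 0)
)

# The mean of a 0/1 flag is the share of delinquent loans in that period.
pct_delinquent <- 100 * tapply(loans$Delinquent, loans$Year, mean)

# The sum of the flag is the number of delinquent loans in that period.
n_delinquent <- tapply(loans$Delinquent, loans$Year, sum)

print(pct_delinquent)
print(n_delinquent)
```

The percentage depends on how many loans are still current in a period, while the count does not, which is why the two can tell different stories for recent years.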
The drop in Q2 2009 should be due to Prosper’s shut down in the previous quarters as explained by the steady rise afterwards.
There is a drop towards the beginning of 2013 and it is probably explained by when most borrowers default on their loans.
Most of the first defaults happen between months 6 and 16. This explains the sharp drop in 2012-2013. A reason could be that the loans taken during 2013 are still in the initial phase and defaults haven’t happened yet. Prosper’s shut down in 2008 and a slow rise in loan amount in the immediate period following the reopening may explain the drop in defaults in 2012-2013.
It would be interesting to see the performance of loans up to 2017, especially since the break in 2009 alters the shape of this graph.
The borrower rate varies with the prosper rating. Borrowers with a poor prosper rating pay more than double the interest rate that a borrower with a good prosper rating pays.
The credit score is one of the strongest contributors to the Prosper Rating. A good credit score means a good Prosper Rating.
Prosper’s lending patterns seem to have changed since its change in business model and reopening in 2009. The mean credit score of borrowers has improved since 2009 and the average score has been over 680 since then. This could be due to the change in business model where Prosper determines the rate of interest for a borrower as opposed to an auction model pre-2009. The rate of interest may not be too appealing for users with a poor credit score.
I noticed that the number of current delinquencies is an indicator of credit of a borrower. Borrowers with a good credit score have fewer delinquencies.
I expected to see some variation in Debt to Income ratio over time and with income range. I was surprised to see that neither was the case.
The strongest relationship I found was between Prosper Rating and Borrower Rate/Lender Yield.
Continuing the analysis from the previous section, I want to see what credit range these delinquent loans fall into.
What is the credit bucket for the delinquent loans?
What income range do these credit score buckets fall into?
The credit scores are spread across the income ranges and this makes sense because income plays an indirect role on the credit score.
The borrower rate increases with a poor Prosper rating. Prosper did not have these ratings prior to 2009 and those records appear as NA. It is interesting to see that a lot of the high risk loans appear in the NA bucket. This is another indication that there are fewer borrowers in Prosper with credit scores less than 600 since 2009.
Credit score and lender yield over time
Since credit scores have a strong relation to prosper rating and hence to the lender yield, I wanted to see how the relationship between credit score and lender yield has evolved over time.
The black line represents the average lender yield for a credit score bucket. The blue line represents the overall average lender yield.
The plots match my expectation.
The high risk loans promise a yield above average and the low risk ones promise a yield below average.
The medium risk loans seem like a good bet, offering returns closer to the average.
The overall returns though seem to be on a downward trend since 2012.
Lender yield and Prosper score
How about lender yield and Prosper score? I expected them to show a similar relation.
This shows a similar relation as the credit scores and lender yield. The yield increases with investment in a poorer prosper rating.
There is not a lot of variation in AA and A rated loans from 2009 to 2014. The B, C, D and E rated loans have seen a dip in the lender yields from 2012. The HR rated loans also yield lower returns since 2011 and interestingly the variation in rates within the HR category has dropped significantly compared to 2009.
Lender yield, prosper rating and credit score
Within a prosper rating, the average lender yield does not vary much. No matter what the borrower’s credit score is, a prosper rating of E gives an average lender yield of approximately 20%.
Within the same credit score bucket, a poorer prosper rating gives better yields. A credit score of 720 and above in AA rating yields about 5% return but the same credit score in E yields about 25%.
This is another indication that Prosper’s rating is influenced by a lot more factors, though credit score seems to be one of the contributors.
Investment Patterns
Do investors look for a safety net when investing in high risk loans? Based on the credit rating and Prosper score for a borrower, I wanted to analyse if there are more investors for poorly rated loans. In other words, do investors try to reduce their risk by reducing the amount they invest in poorly rated loans?
The NA rating is not adding much value, so filtering it will give me a better picture.
The investment patterns seem spread out. The density of green dots in the AA rating indicates that there are more investors per AA rated loan. Investors want to spread their funds across loans. AA loans are safer but the returns are very low. My original thought process was that investors will prefer to invest smaller amounts in loans rated D, E and HR. But it seems like a lot of investors want to spread their investments across different ratings.
The mean percentage funding for Prosper loans is approximately 99% which means that most of the loans get close to 100% funding. So investment patterns suggest that more investors are willing to put in more funds in the B, C and D category of loans.
Borrower rate, prosper rating and credit score
The borrower rate is highly dependent on the Credit score and the Prosper Rating of the borrower. There are no borrowers with a credit score less than 600 in AA and A rated loans. Borrowers with credit scores greater than 720 are spread from prosper rating AA to E, indicating that credit score is just one of the factors that affects prosper rating.
The borrower rate steadily grows from AA to HR.
The variation in borrower’s rate within a Prosper rating is very minimal.
This part of the investigation strengthened the impact of Prosper Rating on the borrower rate. Prosper Rating seems to be the factor that influences the borrower rate the most.
The other interesting observation is the almost non-existent borrowers in credit score lower than 600 since 2009. Most borrowers seem to be in the credit score range 600 to 720.
Across credit score ranges, the borrower rate for a Prosper Rating seems to be constant. I expected the Prosper Ratings to move in the same direction as the credit scores. But that does not seem to be the case and Prosper seems to use a combination of other factors to arrive at the rating.
Loans with a Prosper Rating of A, B and C are the most popular among investors. AA rated loans do not see as much investment in spite of being the best rated. This may be due to the low lender yield on AA loans. This gives us insight into the lending patterns on Prosper. Most lenders want to maximise their returns and are willing to take moderate risks.
This graph shows what changed in Prosper when it reopened business in 2009. Along with the change in business model a key improvement seems to be a better borrower profile. The mean credit score for borrowers slowly increased since 2008. As of Q1 2014, Prosper has an average borrower credit score of 690 compared to 590 in Q4 2006.
This one sums up the findings well. Prosper Rating matters the most when determining borrower rate. Credit scores are one of the main parameters used to determine the Prosper rating. Prosper Ratings A, B, C and D provide good returns to investors with varying credit profiles and are hence the most popular options among investors.
I picked up the Prosper dataset as Peer-to-Peer lending concept is a new domain to me. The dataset with its 81 variables was daunting at first. I started with too many variables and quickly found myself overwhelmed and my analysis directionless. But after visualizing a few patterns and limiting the list of variables, the data started to make sense.
Prosper Rating seems to be one of the most important factors for borrower rate and lender yield. With Prosper’s change in business model the loans seem to have become unattractive for borrowers with a credit score less than 600. Since its reopening in 2009, the loan amount has been on the rise every quarter. The number of delinquent loans seems to have dropped but the data stops at 2014 and the trend may not be meaningful for the available period. Investors seem to prefer the medium risk loans in the Prosper rating A to C category as they offer better returns with a medium risk of chargeoffs.
I would have liked to dig deeper on the charged off loans and understand how many of these are recovered and how much money investors lost in delinquent loans. I could not find variables in the dataset that will help me analyze the recovery made on charged off loans. This will be an interesting analysis for me if the data becomes available in the future.
Borrowers data did not contain age and gender. Analyzing the loan performance by these factors can also be an interesting project.
The loan statuses show a real trend only 2 to 3 years after origination. The data from 2009 to 2013 is interesting and shows some trends in decreasing lender yields. To get more insight into loan performance and delinquencies, more recent data would be required. This again would make a nice investigation for the future.